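In time series forecasting, a common practice is to evaluate multiple methods and then choose one of them, or an ensemble, to produce the best forecasts. However, choosing among different ensembles over multiple methods remains a challenging task that undergoes a combinatorial explosion as the number of methods increases. In demand forecasting or revenue forecasting, this challenge is further exacerbated by the large number of time series and by the limited number of historical data points available due to a changing business context. Although deep learning forecasting methods aim to forecast large collections of time series simultaneously, they are challenging to apply in such scenarios because of the limited history available, and may not yield desirable results. We propose a framework for forecasting short, high-dimensional time series data by combining low-rank temporal matrix factorization with optimal model selection on the latent time series using cross-validation. We demonstrate that forecasting the latent factors leads to significant performance gains compared with directly applying different univariate models to the time series. Performance is validated on a truncated version of the M4 monthly dataset, which contains time series data from multiple domains, showing the general applicability of the method. Moreover, because the number of latent factors is small, the framework is amenable to incorporating the analyst's view of the future, which is usually impractical when forecasting methods are applied directly to high-dimensional datasets.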
translated by Google Translate
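As an illustration of the idea in the forecasting abstract above, the following is a minimal sketch of "factorize, forecast the latent factors, reconstruct". It is not the authors' method: a truncated SVD stands in for temporal matrix factorization, a least-squares AR(1) model stands in for the cross-validated model selection, and the function name `forecast_via_latent_factors` is hypothetical.

```python
import numpy as np

def forecast_via_latent_factors(Y, rank, horizon):
    """Forecast a (n_series, n_timesteps) matrix Y through a few latent factors.

    Assumptions (not from the abstract): truncated SVD as the low-rank
    factorization, and an AR(1)-with-intercept model for each latent factor.
    """
    # Low-rank factorization Y ~= F @ X.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    F = U[:, :rank] * s[:rank]   # loadings, shape (n_series, rank)
    X = Vt[:rank]                # latent temporal factors, shape (rank, n_timesteps)

    X_future = np.empty((rank, horizon))
    for k in range(rank):
        x = X[k]
        # Fit x[t+1] ~= phi * x[t] + c by least squares.
        A = np.column_stack([x[:-1], np.ones(len(x) - 1)])
        (phi, c), *_ = np.linalg.lstsq(A, x[1:], rcond=None)
        last = x[-1]
        for h in range(horizon):
            last = phi * last + c
            X_future[k, h] = last

    # Map the forecast latent factors back to the original series.
    return F @ X_future

# Usage on synthetic data: 50 short series of length 24, 3 latent factors.
rng = np.random.default_rng(0)
Y_demo = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 24))
forecasts = forecast_via_latent_factors(Y_demo, rank=3, horizon=6)
print(forecasts.shape)
```

Only `rank` small models are fitted instead of one per series, which is what makes manual adjustment of the latent forecasts practical when the number of series is large.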
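The variety of digital payment options available to consumers today has been a key driver of e-commerce transactions over the past decade. Unfortunately, this has also given rise to cybercriminals and fraudsters who constantly look for vulnerabilities in these systems by deploying increasingly sophisticated fraud attacks. A typical fraud detection system employs standard supervised learning methods, with the focus on maximizing the fraud recall rate. However, we argue that such a formulation can lead to suboptimal solutions. The design requirements for these fraud models are that they be robust to the high class imbalance in the data, adapt to changes in fraud patterns, maintain a balance between the fraud rate and the decline rate to maximize revenue, and be amenable to asynchronous feedback, since there is usually a significant lag between a transaction and the realization that it was fraudulent. To achieve this, we formulate fraud detection as a sequential decision-making problem by including utility maximization within the model in the form of the reward function. The historical decline rate and fraud rate define the state of the system, with a binary action space consisting of approving or declining the transaction. In this study, we focus primarily on utility maximization and explore different reward functions to this end. The proposed approach has been evaluated on two publicly available fraud datasets using deep Q-learning and compared with different classifiers. We aim to address the remaining issues in future work.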
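The advent of modeling contextual information with models such as BERT, ELMo, and Flair has significantly improved representation learning for words. It has also yielded SOTA results in almost every NLP task, such as machine translation, text summarization, and named entity recognition, to name a few. In this work, in addition to using these dominant context-aware representations, we propose a Knowledge Aware Representation Learning (KARL) network for Named Entity Recognition (NER). We discuss the challenges of using existing methods to incorporate world knowledge, and show how our proposed method can be leveraged to overcome these challenges. KARL is based on a Transformer encoder that utilizes large knowledge bases represented as fact triplets, converts them to a graph context, and extracts the essential entity information residing inside to generate contextualized triplet representations for feature augmentation. Experimental results show that augmentation with KARL can considerably boost the performance of our NER system and achieves significantly better results than existing approaches in the literature on three publicly available NER datasets, namely CoNLL 2003, CoNLL++, and OntoNotes v5. We also observe that KARL generalizes better and is more applicable to real-world settings on unseen entities.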
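In recent years, research in Natural Language Processing (NLP) has witnessed dramatic growth in training large models to generate context-aware language representations. In this regard, numerous NLP systems have leveraged the power of neural network-based architectures to incorporate sense information in embeddings, resulting in Contextualized Word Embeddings (CWEs). Despite this progress, the NLP community has not seen any comparative study of the contextualization power of such architectures. This paper presents a comparative study and an extensive analysis of nine widely adopted Transformer models: BERT, CTRL, DistilBERT, OpenAI-GPT, OpenAI-GPT2, Transformer-XL, XLNet, ELECTRA, and ALBERT. We evaluate their contextualization power using two lexical sample Word Sense Disambiguation (WSD) tasks, SensEval-2 and SensEval-3. We adopt a simple yet effective approach to WSD that uses k-Nearest Neighbor (kNN) classification on CWEs. Experimental results show that the proposed technique also achieves state-of-the-art results on the WSD tasks.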
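To make the kNN-on-embeddings idea concrete, here is a minimal sketch of sense classification by nearest neighbors. The two-dimensional vectors are hand-made stand-ins for contextualized embeddings of the word "bank", and the choice of cosine similarity, `k=3`, and the function name `knn_sense` are assumptions for illustration, not details taken from the abstract.

```python
from collections import Counter

import numpy as np

def knn_sense(query, train_vecs, train_senses, k=3):
    """Predict a word sense by majority vote among the k most similar training vectors."""
    # Cosine similarity between the query and every labeled embedding.
    train_unit = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = train_unit @ query_unit
    # Indices of the k most similar labeled examples.
    nearest = np.argsort(-sims)[:k]
    votes = Counter(train_senses[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy stand-ins for contextual embeddings of "bank" in labeled sentences.
train_vecs = np.array([
    [1.0, 0.1], [0.9, 0.2], [0.8, 0.0],   # financial contexts
    [0.1, 1.0], [0.0, 0.9], [0.2, 0.8],   # river contexts
])
train_senses = ["finance"] * 3 + ["river"] * 3

print(knn_sense(np.array([0.95, 0.05]), train_vecs, train_senses))
print(knn_sense(np.array([0.10, 0.90]), train_vecs, train_senses))
```

In a real setup the vectors would come from one of the Transformer models listed above, so the classifier itself adds no trainable parameters and differences in accuracy reflect the embeddings.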
This paper presents a comprehensive survey of low-light image and video enhancement. We begin with the challenging case of mixed over-/under-exposed images, on which existing methods underperform. To this end, we propose two variants of the SICE dataset named SICE_Grad and SICE_Mix. Next, we introduce Night Wenzhou, a large-scale, high-resolution video dataset, to address the lack of low-light video datasets, which has discouraged the application of low-light image enhancement (LLIE) to videos. The Night Wenzhou dataset is challenging since it consists of fast-moving aerial scenes and streetscapes with varying illumination and degradation. We conduct extensive key technique analysis and experimental comparisons for representative LLIE approaches using these newly proposed datasets and the current benchmark datasets. Finally, we address unresolved issues and propose future research topics for the LLIE community.
We investigate data-driven texture modeling via analysis and synthesis with generative adversarial networks. For network training and testing, we have compiled a diverse set of spatially homogeneous textures, ranging from stochastic to regular. We adopt StyleGAN3 for synthesis and demonstrate that it produces diverse textures beyond those represented in the training data. For texture analysis, we propose GAN inversion using a novel latent domain reconstruction consistency criterion for synthesized textures, and iterative refinement with Gramian loss for real textures. We propose perceptual procedures for evaluating network capabilities, exploring the global and local behavior of latent space trajectories, and comparing with existing texture analysis-synthesis techniques.
Recent advances in deep learning research, such as transformers, have bolstered the ability for automated agents to generate creative texts similar to those that a human would write. By default, transformer decoders can only generate new text with respect to previously generated text. The output distribution of candidate tokens at any position is conditioned on previously selected tokens using a self-attention mechanism to emulate the property of autoregression. This is inherently limiting for tasks such as controllable story generation where it may be necessary to condition on future plot events when writing a story. In this work, we propose Future Sight, a method for finetuning a pretrained generative transformer on the task of future conditioning. Transformer decoders are typically pretrained on the task of completing a context, one token at a time, by means of self-attention. Future Sight additionally enables a decoder to attend to an encoded future plot event. This motivates the decoder to expand on the context in a way that logically concludes with the provided future. During inference, the future plot event can be written by a human author to steer the narrative being generated in a certain direction. We evaluate the efficacy of our approach on a story generation task with human evaluators.
Transformer-based models have gained considerable popularity and demonstrated promising results in long-term time-series forecasting in recent years. In addition to learning attention in the time domain, recent works also explore learning attention in frequency domains (e.g., Fourier domain, wavelet domain), given that seasonal patterns can be better captured in these domains. In this work, we seek to understand the relationships between attention models in different time and frequency domains. Theoretically, we show that attention models in different domains are equivalent under linear conditions (i.e., with a linear kernel applied to the attention scores). Empirically, we analyze how attention models of different domains show different behaviors through various synthetic experiments with seasonality, trend and noise, with emphasis on the role of the softmax operation therein. Both the theoretical and empirical analyses motivate us to propose a new method, TDformer (Trend Decomposition Transformer), which first applies seasonal-trend decomposition and then additively combines an MLP that predicts the trend component with Fourier attention that predicts the seasonal component to obtain the final prediction. Extensive experiments on benchmark time-series forecasting datasets demonstrate that TDformer achieves state-of-the-art performance against existing attention-based models.
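The equivalence claim for the linear case can be checked numerically. The sketch below is one reading of it, under assumptions that are not spelled out in the abstract: attention without softmax is computed as `(Q K^T) V`, the Fourier-domain variant applies a unitary DFT along the time axis to `Q`, `K`, and `V`, and keys are conjugate-transposed. It may differ from the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 16, 4
Q, K, V = (rng.standard_normal((seq_len, d_model)) for _ in range(3))

# Time-domain attention with a linear kernel (no softmax on the scores).
out_time = (Q @ K.T) @ V

# Fourier-domain attention: unitary DFT along the time axis (norm="ortho").
Qf, Kf, Vf = (np.fft.fft(M, axis=0, norm="ortho") for M in (Q, K, V))
out_freq = (Qf @ Kf.conj().T) @ Vf

# Map the Fourier-domain output back to the time domain.
out_back = np.fft.ifft(out_freq, axis=0, norm="ortho")

print(np.allclose(out_back.real, out_time))
print(np.abs(out_back.imag).max())
```

The check works because the unitary DFT matrix cancels against its conjugate transpose between the scores and the values. A nonlinearity such as softmax applied to the scores sits between those two factors and prevents the cancellation, which is consistent with the abstract's emphasis on the role of softmax.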
Boundary conditions (BCs) are important groups of physics-enforced constraints that are necessary for solutions of Partial Differential Equations (PDEs) to satisfy at specific spatial locations. These constraints carry important physical meaning, and guarantee the existence and the uniqueness of the PDE solution. Current neural-network based approaches that aim to solve PDEs rely only on training data to help the model learn BCs implicitly. There is no guarantee of BC satisfaction by these models during evaluation. In this work, we propose Boundary enforcing Operator Network (BOON) that enables the BC satisfaction of neural operators by making structural changes to the operator kernel. We provide our refinement procedure, and demonstrate the satisfaction of physics-based BCs, e.g. Dirichlet, Neumann, and periodic by the solutions obtained by BOON. Numerical experiments based on multiple PDEs with a wide variety of applications indicate that the proposed approach ensures satisfaction of BCs, and leads to more accurate solutions over the entire domain. The proposed correction method exhibits a (2X-20X) improvement over a given operator model in relative $L^2$ error (0.000084 relative $L^2$ error for Burgers' equation).
Training a neural network requires choosing a suitable learning rate, involving a trade-off between speed and effectiveness of convergence. While there has been considerable theoretical and empirical analysis of how large the learning rate can be, most prior work focuses only on late-stage training. In this work, we introduce the maximal initial learning rate $\eta^{\ast}$ - the largest learning rate at which a randomly initialized neural network can successfully begin training and achieve (at least) a given threshold accuracy. Using a simple approach to estimate $\eta^{\ast}$, we observe that in constant-width fully-connected ReLU networks, $\eta^{\ast}$ demonstrates different behavior to the maximum learning rate later in training. Specifically, we find that $\eta^{\ast}$ is well predicted as a power of $(\text{depth} \times \text{width})$, provided that (i) the width of the network is sufficiently large compared to the depth, and (ii) the input layer of the network is trained at a relatively small learning rate. We further analyze the relationship between $\eta^{\ast}$ and the sharpness $\lambda_{1}$ of the network at initialization, indicating that they are closely though not inversely related. We formally prove bounds for $\lambda_{1}$ in terms of $(\text{depth} \times \text{width})$ that align with our empirical results.